ABSTRACT 

Museums hold enormous amounts of information in collections management 
systems and publish academic and scholarly research in print journals, exhibition 
catalogs, virtual museum presentations, and community publications. Much of this rich 
content is unavailable to web search engines or otherwise gets lost in the vastness of 
the World Wide Web. The Open Archives Initiative (OAI) has developed an easily 
implemented protocol to enable data providers to expose their information and service 
providers to access and use it. The CIMI Consortium is working with the OAI to make it 
possible for museums to enhance the availability of their research resources, allowing 
them to be discovered in Web-space by the specialist audiences for which they are 
intended or by service providers who collect, distribute or in other ways provide 
access. By building on the OAI protocol, Dublin Core, and museum community XML 
developments, significant advancements can be made in exposing museum information 
resources. This paper introduces the OAI and its protocol, explores its potential 
relevance to museums, presents CIMI's work as an alpha tester of OAI, and looks ahead 
to future developments. (Contains 11 references.) (Author) 

John Perkins, CIMI Consortium, Canada 

Introduction 

The ubiquity of the Web and success of popular search engines have 
fueled an expectation for quick, easy, and successful results in the quest 
for information and knowledge. Increasingly, scholars, students and 
other explorers are turning to the Web for their research needs and 
relying less often on traditional research sources. Museums have 
immensely rich information resources in publications, research papers, 
exhibition catalogues, virtual museums, databases, and intranets, but 
much valuable information about the kinds of materials museums hold 
is rarely available through web search engines. Internet search 
engines reach only static HTML web pages, and much of what 
museums have is opaque to the indexers because it is in databases, 
dynamically generated, or in some other non-HTML form. These 
resources constitute what is becoming known as the hidden Web, 
estimated to contain 400-550 times more content than the commonly 
defined Web. (BrightPlanet 2001) 

If this problem alone were solved and all the hidden web resources were 
suddenly available for indexing, the difficulty of finding reliable, useful, 
precise information would be seriously compounded, not alleviated. One 
way to address this is through collecting and indexing metadata records, 
rather than indexing the entire contents of HTML pages, thereby 
providing greater possibilities for precision. This is essentially the 
traditional library approach of creating descriptive metadata and building 
union catalogues. However, library catalogues are expensive to maintain 
and, in the Web world, are both difficult to find and hard to search across. 

As separate approaches, it seems neither the old library methods nor 
the new Internet approach is serving researchers and scholars 
particularly well. (CLIR 2001) 

A particularly promising solution is to explore the utility of combining the 
best of traditional library and museum techniques, such as creating 
descriptive metadata records in catalogues, with the best of new Internet 
techniques like large-scale machine harvesting of information. It is 
possible to consider this because of new developments in Web-workable 
technical protocols, the uptake of XML as a way to package and transfer 
information, and the development of international standards for 
describing museum metadata content. 

The Open Archives Initiative 

The Open Archives Initiative (OAI) (http://www.openarchives.org) 
develops and promotes technical protocols and standards, collectively 
called the OAI technical framework, to facilitate access to scholarly 
research information on the Web. It is based on the premise that a 
simple, easily implemented technical framework can allow holders of 
information to create repositories of metadata describing their resources 
that in turn can be harvested and made available for further processing 
or use. (OAI Protocol 2001) 

The OAI technical framework describes how repositories of metadata 
about information resources are constructed. Repositories are 
essentially network-accessible servers offered by data providers. Through 
a simple protocol, a repository makes available records that contain 
metadata about its items (content). A repository may, optionally, 
organize its items into sets corresponding to its collections or other 
groups, thus allowing clients to harvest metadata records selectively. 
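
The relationship between items, records, and sets can be sketched in a few lines of Python. This is only an illustration of the idea of selective harvesting, not the data model of the OAI specification or of the CIMI repository; the class names, identifiers, and set names below are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    identifier: str                  # hypothetical item identifier
    metadata: dict                   # metadata fields describing the item
    sets: frozenset = frozenset()    # optional set memberships


class Repository:
    """Toy repository that exposes records and optional sets."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.identifier] = record

    def list_sets(self):
        # Every set name that at least one record belongs to.
        return sorted({s for r in self._records.values() for s in r.sets})

    def list_records(self, set_spec=None):
        # Without a set, everything is harvested; with one, only its members.
        records = sorted(self._records.values(), key=lambda r: r.identifier)
        if set_spec is None:
            return records
        return [r for r in records if set_spec in r.sets]


repo = Repository()
repo.add(Record("obj-1", {"title": "Stoneware bowl"}, frozenset({"ceramics"})))
repo.add(Record("obj-2", {"title": "Harbour at dusk"}, frozenset({"paintings"})))
repo.add(Record("obj-3", {"title": "Field notebook"}))

ceramics_only = [r.identifier for r in repo.list_records("ceramics")]
```

A harvester interested only in ceramics would ask for that one set and receive a single record, while a request without a set returns all three.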

A record is an XML-encoded byte stream that serves as a packaging 
mechanism for harvested metadata. The OAI protocol mandates the use 
of unqualified Dublin Core as the common record for discovery. (Dublin 
Core 2001) It also allows community-specific metadata sets described 
by XML Schemas for more detailed description. This rests on the 
assertion that two things are needed: simple metadata for 
interoperability and cross-domain discovery, and a method for 
conveying richer community-specific descriptions. 
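
As a rough illustration of what a flat, unqualified Dublin Core description looks like when packaged as XML, the following Python sketch builds one with the standard library. The wrapper element name `record` and the sample values are assumptions for the example; the exact record container is defined by the OAI protocol documents and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a flat Dublin Core description from (element, value) pairs."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration.
xml_text = dublin_core_record([
    ("title", "Carved cedar mask"),
    ("creator", "Unknown"),
    ("publisher", "Example Museum"),
])
```

Because unqualified Dublin Core has no nesting or refinements, each element is simply a name and a text value, which is what makes it easy to produce from almost any collections database.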

All OAI repositories must recognize a set of requests, or verbs, carried in 
HTTP POST or GET methods that allow access to the metadata records. It 
is through these commands that metadata is harvested and transferred. 
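
A harvesting request carried in an HTTP GET is essentially a base URL plus a verb and its arguments in the query string. The sketch below shows one way to assemble such a URL in Python; the base URL and set name are made up, and the argument names should be checked against the protocol document for the version in use.

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI protocol.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, arguments=None):
    """Return a GET URL for one protocol request."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    query = urlencode({"verb": verb, **(arguments or {})})
    return f"{base_url}?{query}"


# Hypothetical repository address and set name.
url = build_request(
    "http://museum.example.org/oai",
    "ListRecords",
    {"metadataPrefix": "oai_dc", "set": "ceramics"},
)
# url == "http://museum.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=ceramics"
```

The same parameters could equally be sent in the body of a POST; only the transport differs, not the request itself.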

One design criterion of the OAI technical framework of particular 
relevance to individual communities such as the museum community is 
the notion of extension packages. Not only does the protocol allow a 
community to expose its own metadata schema, but it also allows other 
extensions such as unique collection level metadata or, if deemed 
necessary, rights metadata. The OAI protocol doesn't place limits on the 
number of allowable metadata sets, but does specify that their data 
formats be describable by an XML Schema. 
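
One way to picture parallel metadata sets is as a registry that maps each format to its XML Schema and to a function that renders an item in that format. The Python sketch below is an assumption-laden illustration: the prefix `museum`, the schema URLs, and the field names are invented, and the class is not part of any OAI toolkit.

```python
class FormatRegistry:
    """Maps a metadata prefix to a schema location and a crosswalk function."""

    def __init__(self):
        self._formats = {}

    def register(self, prefix, schema_url, crosswalk):
        # Every format must be describable by an XML Schema, so this
        # sketch refuses a format that names no schema.
        if not schema_url:
            raise ValueError(f"format {prefix!r} needs an XML Schema location")
        self._formats[prefix] = (schema_url, crosswalk)

    def list_formats(self):
        return {prefix: schema for prefix, (schema, _) in self._formats.items()}

    def disseminate(self, item, prefix):
        if prefix not in self._formats:
            raise KeyError(f"unsupported metadata format: {prefix}")
        _, crosswalk = self._formats[prefix]
        return crosswalk(item)


def to_dublin_core(item):
    # Coarse view for cross-domain discovery.
    return {"title": item["object_name"], "creator": item["maker"]}


def to_museum_format(item):
    # Richer community view: here simply every field the museum holds.
    return dict(item)


registry = FormatRegistry()
registry.register("oai_dc", "http://example.org/schemas/oai_dc.xsd", to_dublin_core)
registry.register("museum", "http://example.org/schemas/museum.xsd", to_museum_format)

item = {"object_name": "Tea bowl", "maker": "Unknown", "accession_no": "1999.12.3"}
simple_view = registry.disseminate(item, "oai_dc")
```

The same item is thus exposed twice: once in the common format every harvester understands, and once in a fuller form for harvesters that know the community schema.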

In order to federate distributed repositories, the OAI has established a 
registry service available through the OAI web page to provide a list of 
publicly available repositories and to provide a mechanism for 
conformance testing. (OAI Registry 2001) 

The potential of the OAI technical framework is in providing the enabling 
technology for the federating of distributed information resources and 
their discovery and use. The power of the OAI technical framework is in 
its simplicity and ease of implementation. 

Describing Information Resources for Discovery: Dublin 
Core and XML 

While the OAI protocol defines new technical standards for repositories 
and the machine-to-machine dialogue between data providers and 
harvesters, it draws on the established international standard Dublin 
Core for the mandatory metadata record format. (Dublin Core 2001) The 
Dublin Core metadata set was developed specifically to allow a simple 
and easy-to-use description of information resources for their discovery. 
The utility of Dublin Core was corroborated by CIMI in its Dublin Core 
Metadata Testbed that explored the use of unqualified Dublin Core for 
discovery of museum resources, both at a coarse grain level and at a 
more detailed, complex level. At the higher, coarse grain level, the 
Dublin Core is effective both for discovery of resources and as a means 
for museums to interoperate with other communities in a networked 
environment such as the World Wide Web. (CIMI 1999a) 

To go beyond simple discovery and interoperation, the OAI anticipated, 
through inclusion of the extension packages concept, that in addition to 
a core metadata format, individual communities of implementers would 
require additional descriptive formats. Again, this need was borne out in 
the CIMI Dublin Core testbed findings where it was concluded that 
extending the Dublin Core to handle community-specific needs was 
problematic. (CIMI 1999a) 

Alternatives to extending or qualifying Dublin Core need to be found to 
facilitate the more complete descriptions needed by the museum 
community. The OAI addresses this by allowing support for parallel 
metadata sets. For museums, this could conceivably include record 
structures such as SPECTRUM (rich museum object information), CIMI 
(public access), AMICO (art museum images), MIDIS (monuments and 
built environment), OBJECT ID (loss and theft), and RLG Inc.'s CMI 
(cultural materials). 

The challenge is that each community of OAI implementers must agree 
on what metadata formats are needed beyond the core, and must 
provide XML Schemas for each of them. Once this is accomplished, 
the metadata foundations will be in place for use of the OAI protocol. 

Early in the development of the OAI, CIMI recognized that the initiative 
had a number of features that could help significantly advance access to 
museum information. First and perhaps most importantly, the OAI protocol 
was simple and appeared to be easy to implement using tools and skills 
(web servers, HTTP, Java, Perl, CGI, etc.) within the easy reach of 
museums. Secondly, it relied on the Dublin Core as a metadata format 
for the simple discovery of information resources within and between 
communities. This format was proven workable for museums, and there 
exists a guide to best practice for its use. (CIMI 1999b) Finally, the OAI 
mandated XML for packaging richer metadata sets and transferring 
records. XML is a standard that is gaining wide acceptance in museums, 
and XML Schemas exist or are in the process of being created for 
many of the community standards mentioned above. 

CIMI's Test of OAI V.1.0 

Because of the perceived potential of OAI for museums, CIMI 
participated as a pre-release tester of the OAI protocol. (OAI Alpha Test 
2001) As part of the test, we built a generic OAI-compliant repository. 
(CIMI OAI Repository 2001) The repository architecture shown in Figure 
1 uses a layered approach, standardized APIs, a generic HTTP interface, 
and interchangeable components. This allows implementers the use of 
different back-end databases, web servers, or XML generators and 
minimizes hard-wired coding. 
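
The layering idea can be shown in miniature. The Python sketch below is not the CIMI repository code (which was written in Java) and does not produce real OAI responses; it only illustrates, with hypothetical class and function names, how a request-handling layer can depend on an abstract store and a pluggable XML generator so that either can be swapped.

```python
from abc import ABC, abstractmethod
from xml.sax.saxutils import escape


class RecordStore(ABC):
    """Back-end layer: any database can sit behind this interface."""

    @abstractmethod
    def get(self, identifier):
        """Return a dict of metadata fields, or None if unknown."""


class InMemoryStore(RecordStore):
    """Stand-in for a real database back end."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def get(self, identifier):
        return self._rows.get(identifier)


def simple_xml(fields):
    """XML-generation layer: flat elements only, not real OAI output."""
    body = "".join(f"<{k}>{escape(v)}</{k}>" for k, v in fields.items())
    return f"<metadata>{body}</metadata>"


class HttpFacade:
    """Interface layer: maps request parameters to store and generator calls."""

    def __init__(self, store, render):
        self._store = store
        self._render = render

    def handle(self, params):
        if params.get("verb") != "GetRecord":
            return "<error>unsupported verb</error>"
        fields = self._store.get(params.get("identifier"))
        if fields is None:
            return "<error>unknown identifier</error>"
        return self._render(fields)


facade = HttpFacade(InMemoryStore({"obj-1": {"title": "Bowl & lid"}}), simple_xml)
response = facade.handle({"verb": "GetRecord", "identifier": "obj-1"})
```

Replacing `InMemoryStore` with a class that queries a relational database, or `simple_xml` with a different generator, would leave `HttpFacade` untouched, which is the point of the layered design.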




Figure 1: CIMI OAI Repository Layered Architecture 

The repository took a skilled Java programmer two weeks of elapsed time 
to build. This period included an orientation to CIMI and the OAI, 
reading and understanding the protocol, and then building the 
application. The development process started with designing a Java 
API for the repository and a Java servlet to interface between the 
HTTP/OAI protocol layers and the repository. The reference repository 
was written using MySQL and JDBC. 

The CIMI reference application serves Dublin Core records, generated by 
the earlier CIMI testbed, from the MySQL database through an Apache 
web server. Because of the modularity inherent in the architecture, the 
repository could be layered on top of any ODBC-compliant database, 
be served from other servers, and make use of different XML generators. 

Looking Ahead 

The initial evaluation demonstrated that the OAI protocol is indeed 
simple to build. CIMI has limited technical resources and skills but was 
nonetheless able to successfully build an OAI repository that appears to 
be useful. Based on the positive experience as an alpha implementer, 
CIMI plans to continue explorations of the OAI protocol and research its 
use by museums. 

One way is to make the code for the CIMI repository and its 
associated explanatory materials available for downloading from the 
CIMI Website. (CIMI Publications 2001) We hope museums will take 
advantage of its availability to install, experiment with and use the 
protocol. We hope to compile and report the experiences of these ad 
hoc tests. 

CIMI is also interested in conducting a more formal, large-scale test of 
the OAI for museums as a CIMI testbed. As part of this work, we 
propose using OAI V.1.x in combination with scoped extensions and 
other applications necessary for aggregation processes (e.g. editorial 
control, content management and enhancement, registry) to harvest and 
collect museum metadata from cultural memory organizations. It will 
focus on materials that document culture and civilizations, including 
museum objects, art, images, and related materials. We will structure 
this as a CIMI testbed, inviting participation from a group of interested 
members. We expect respondents to include national museum 
organizations, individual museums, commercial enterprises, and 
museum system vendors. Once underway, the project will run 12-18 
months in concert with projects in other communities and the OAI test 
period. 

The purpose of the research is to explore how a specific community of 
users can use the OAI protocol. Part of this is to investigate what 
agreements users need to make within the protocol framework itself 
(e.g. additional metadata sets), and part is to identify any extensions or 
modifications required to make the framework additionally useful. Our 
testbed will give museums a place to expose their metadata and 
promote their institutions, test the OAI protocol for utility in describing 
non-bibliographic resources, and could provide a rich resource of 
cultural metadata leading eventually to the materials themselves and the 
institutions offering them. 

It is one thing to test the technical viability of the OAI protocol by 
implementing the protocol at a technical level, but another to imagine 
and determine useful services that might be built on it. We have 
imagined a number of scenarios that could be tested. 

We imagine, for example, that services like AMOL (Australian Museums 
Online), AMICO, the Canadian Digital Museum, or RLG Inc.'s Cultural 
Materials Initiative might want to add a feature to "search for more like 
this" in collections or repositories not under their direct control. We 
imagine that individual museums or groups of museums all using the 
same collections management system might make use of the repository 
for internal operational needs, for scholarly access, as well as for 
supplementing information services they provide publicly. We imagine 
that commercial services such as AskArt (http://www.askart.com) - a 
directory of American Artists - or Virtualology 
(http://www.virtualology.com) - a virtual education project - would find 
the resource attractive and useful. We imagine that an easy-to-use 
protocol might be attractive to sales and auction houses, encouraging 
them to make useful research information resources available (such as 
those now manually compiled). We expect national service providers like 
the UK JISC higher education information services to have an interest in 
using museum repositories. We know that the operators of the new 
Internet top-level domain for museums (MuseDoma) are extremely 
interested in providing directory-like services that would include search 
access to our harvested cultural materials metadata. We also imagine 
that harvesting exhibition catalogues and museum publications from 
library catalogues, artist biographies, museological literature from A&I 
services, and sales records from auction houses is of interest to 
museum researchers. These all are the kinds of services that might 
emerge once the OAI is widely deployed in the museum community. 

Regardless of the services developed, there will be a number of issues 
relating to widespread adoption of the OAI protocol in the museum 
community. We foresee a need for our community to test hypotheses, 
assertions, and issues such as: 

• the utility of the Dublin Core for meeting information requirements 
of service providers and consumers; 

• the functionality of the OAI protocol as a basis for a harvesting 
service, including issues of hierarchical descriptions, scalability, 
required extensions, presentation and partitioning; 

• community extensions required for the OAI and DC in order to 
provide useful metadata within and between communities; 

• requirements and practices for content management, metadata 
enhancement, and editorial control; 

• aggregation, integration, access and presentation of 
bibliographic, textual, multiple media, and object metadata; 

• the need, scope and services of a registry; 

• access control and rights; 

• mechanisms and processes for paths to underlying content; 

• a business model for sustainability. 

Conclusion 

Both CIMI and many of our members have significant experience in the 
metadata harvesting business. It is this experience that motivates us to 
explore the OAI protocol as an enabling technology to facilitate access 
to resources by making it easier for museums to expose and collect 
metadata. The OAI protocol in concert with a museum testbed seems a 
logical and sensible research initiative that will bring us closer to making 
the rich information resources museums hold more widely available to 
researchers and other users. 